Introduction

In this report, I will analyze the Seattle library checkouts of works by Jane Austen. Despite the time, Jane Austen continues to be a prevalent author today. Many cherish and admire Jane Austen’s novels (me included). Her stories have been translated into several languages and have inspired multiple adaptations, documentaries, and retellings. Jane Austen has various works, including short stories and, of course, her novels. For this analysis, I decided to focus on her six novels: Sense and Sensibility (1811), Pride and Prejudice (1813), Mansfield Park (1814), Persuasion (1817), and Northanger Abbey (1817). I excluded Sandition because although it was published posthumously, like Persuasion and Northanger Abbey, it was uncompleted at the time of her death.

I used data from the SPL Open Data portal and filtered out all the checkout entries for all materials with Jane Austen as the creator (this included entries that contained “Austen, Jane”). In addition, I included all the years possible to see the popularity trends over time for each novel. I did not exclude any material type but rather had all the different types, including ebooks, audiobooks, sound disks, and more. I wanted to understand how famous each story was as a whole; thus, I tried to capture the data from different forms. Furthermore, I was interested in discovering how the material type may have changed over the years.

Summary Information

The data set revealed exciting details. For example, the data disclosed that in 2022, people checked out Jane Austen novels the most with 6081 checkouts. This is a fascinating fact to process and shows the relevance of classics in today’s age. I then took a step back to analyze the total checkouts from 2006 to 2023 and find which novel was the most popular. As expected, Pride and Prejudice was checked out the most over the years. The novel (in different forms from physical to digital) was checked out a total of 16806 times! At the same time, Mansfield Park was the least popular, only checked out 3223 times over the years.

The Dataset

The data was collected by the Seattle Public Library and published by the City of Seattle on its open data portal. The dataset captures monthly counts by the title of all checkouts from 2005 to the present (2023). The primary data set contains slightly over 42 million rows and 12 variables, which include: usage class (physical or digital), checkout type (vendor tool used to check out the item), material type (e.g., book, movie, music, etc.), check out year and month, number of checkouts, title, ISBN, creator/author, subjects, publisher, and publication year.

According to the City of Seattle, the data comes from several current and historical sources. Information about digital item checkouts comes from the media vendors, which include Overdrive, hoopla, Freegal, and RBDigital. However, there are two primary sources for physical items: the Legrady artwork data archives (for works older than 2016) and Horizon ILS (2016 to present).

Libraries have a history of keeping records private to protect their members. However, the Seattle Public Library is part of Seattle’s Open Data Program, a program dedicated to making data collected by the City of Seattle available to increase transparency, accountability, research, and more. Therefore, one ethical question to consider is whether the privacy of Seattle Public Library members is protected.

Some possible limitation or issue with the data is how “messy” it is. For example, title names differed between different material types of the same title. The book Pride and Prejudice has inconsistent title names based on their material type and publisher. Some copies contained an introductionove or annotation, which was mentioned in the title. Another issue was the inclusion of novels not by Austen. Jane Austen has two stories that inspired parody novels (Pride and Prejudice and Zombies and Sense and Sensibility and Sea Monsters). The dataset included them because they listed Jane Austen as the creator; thus, I had to ensure to eliminate checkouts from these books.

Total Checkouts of Each Novel

First, I wanted to observe the total checkouts of each novel. Thus, I grouped the dataset by the date and title of the book to find the monthly sum of checkouts and created a line chart. The graph shows interesting revelations. As we already know, Pride and Prejudice was the top novel checked out in the library for the majority of the time over the years. However, the line chart provides a better perception of the novel’s popularity. We see a notable gap between Pride and Prejudice and the other titles. But we also see significant spikes in the past two years. The pandemic and books may be substantial contributors behind the increase in not only Pride and Prejudice but other titles as well. We see that Persuasion surpassed Pride and Prejudice at one point and reached its highest point on July 2022. From prior knowledge, I connected this to the Netflix adaptation of Persuasion which highly influenced this influx of checkouts. When researching further, I discovered that around this time, TikTok also started its book club and started with Persuasion. We also see this happening to Emma, which was recently adapted into a movie.

Your Choice

The last chart is up to you. It could be a line plot, scatter plot, histogram, bar plot, stacked bar plot, and more. Here are some requirements to help guide your design:

Here’s an example of how to run an R script inside an RMarkdown file: